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Spatial  methods  for  the  analysis  of  agricultural  field 
experiments  are  represented  here  as  smoothing  methods  applied 


simultaneously  with  the  estimation  of  treatment  effects.  Selection  of 


both  the  form  of  the  smoother  and  the  degree  of  smoothing  required  may 
be  based  on  cross-validation.  Particular  emphasis  is  placed  in  this 


paper  on  generalized  least-squares  estimation  in  linear  models,  but 
the  principle  applies  quite  generally. 
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SIGNIFICANCE  AND  EXPLANATION 


One  of  the  oldest  problems  in  the  application  of  statistics  to 
experimental  data  is  the  disentangling  of  the  effects  of  environ¬ 
mental  variation  from  those  of  the  treatments,  whose  comparison  is  the 
purpose  of  the  experiment.  In  the  context  of  agricultural  field 
experiments  the  environmental  variation  is  due  to  local  fluctuations 
in  soil  type,  nutrient  concentrations  and  micro-climate,  summarised  by 
an  unknown  level  of  fertility  that  varies  from  experimental  plot  to 
plot. 

Traditionally  the  statistical  analysis  of  yield  data  from  such 
experiments  has  employed  block-based  methods,  which  are  ideal  only 
under  the  unrealistic  assumption  that  the  fertility  level  is  constant 
between  abrupt  breaks  at  prescribed  block  boundaries. 

Recently  there  has  been  much  interest  in  adjusting  for  fertility 
in  a  more  continuous  manner,  recognising  that  the  true  fertility  pro¬ 
file  can  probably  be  more  realistically  represented  as  a  smooth 
surface.  But  the  various  methods  proposed  are  based  on  differing 
statistical  principles,  so  that  comparison  between  methods  has  been 
difficult. 

The  present  paper  facilitates  such  comparison  by  demonstrating 
that  all  the  techniques  suggested,  in  particular  those  based  on  least- 
squares  analysis  of  a  linear  model,  may  be  viewed  as  smoothing  methods 
applied  simultaneously  with  the  estimation  of  additive  treatment 
effects.  Comparison  of  methods  reduces  to  a  study  of  the  behaviour 
and  performance  of  the  corresponding  smoothers. 

It  is  suggested  that  cross-validation  be  used  both  to  choose  the 
degree  of  smoothing  required,  and  to  select  the  appropriate  smoother. 
Algebraic  results  are  presented  that  ease  the  numerical  computations 
involved  in  this  method  for  a  number  of  important  special  cases. 


^  The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 

summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


LINEAR  MODELS  FOR  FIELD  TRIALS,  SMOOTHING, 

AND  CROSS-VALIDATION 

Peter  J.  Green 

1  .  Introduction 

The  aim  of  many  agricultural  field  experiments  is  to  estimate 
treatment  contrasts  efficiently  whilst  avoiding  bias  due  to  trends  in 
fertility  and  other  environmental  factors.  Blocking  methods  are 
customarily  used,  even  when  blocks  have  no  physical  meaning  in  the 
experiment,  but  there  has  recently  been  increasing  interest  in 
adjusting  for  trends  in  a  more  continuous  way  leading  to  so-called 
'spatial'  or  'neighbour'  methods  that  deliberately  exploit  the  spatial 
context . 

An  early  example  is  the  method  of  adjustment  using  residuals  from 
neighbouring  plots  due  to  Papadakis  (1937);  see  also  Bartlett  (1978). 
Succeeding  developments  have  been  fostered  by  increased  general 
interest  in  spatial  methods  and  by  enhanced  computing  power.  Various 
recent  proposals  appear  in  the  innovative  paper  by  Wilkinson,  Eckert, 
Hancock  and  Mayo  (1983)  and  its  accompanying  stimulating  discussion. 

The  intention  of  this  paper  is  to  increase  understanding  of  the 
proposed  methods,  and  to  aid  their  comparison,  by  representing  them 
all  as  smoothing  methods,  whether  they  were  originally  conceived  as 
such  (e.g.  Green,  Jennison  and  Seheult,  1983,  1985),  or  derived  from 
explicit  spatial  stochastic  models  (e.g.  Besag,  1977),  assumed  co- 
variance  structure  (e.g.  Williams,  1985b)  or  other  principles  (e.g. 
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Papadakis;  Wilkinson,  et  al.) 


We  first  demonstrate  that  the  generalized  least  squares  analysis 
of  any  linear  model  is  a  smoothing  method.  While  this  embraces  only 
some  of  the  spatial  methods  mentioned,  it  provides  an  important  link 
with  classical  block-based  analysis.  Indeed,  for  incomplete  block 
designs,  Yates's  analysis  with  recovery  of  inter-block  information 
(1939,  1940)  may  be  regarded  as  a  prototype  example  for  this  dis¬ 
cussion.  The  smoothing  interpretation  extends  to  analyses  not  based 
on  least-squares.  Later  we  discuss  the  choice  or  estimation  of  the 
tuning  constant  or  variance  ratio  controlling  each  method,  advocating 
the  criterion  of  cross-validation  for  this  choice;  this  criterion  also 
provides  a  means  of  selecting  and  assessing  the  method  itself. 

Design  will  not  be  considered  here;  no  attempt  is  made  at  justi¬ 
fication  or  robustification  via  randomization  theory,  so  design  plays 
no  explicit  role  in  analysis.  It  will  of  course  affect  efficiency; 
for  discussion,  see  Martin  (1982)  and  Williams  (1985a). 

2.  Generalized  least  squares 
Consider  the  linear  model 

2 

E(y)  =  DT  +  Rp;  var(y)  =  0  V($)  (2.1) 

for  the  vector  of  yields  y  from  an  experiment  on  n  plots  to  com¬ 
pare  a  single  set  of  t  treatments.  Here  D  is  the  design  matrix 
for  treatment  effects  x  and  R  that  for  any  other  fixed  effects  p 
to  be  fitted;  we  assume  that  the  complete  design  matrix  X  =  [D  :  R] 

has  full  rank.  The  spatial  context  is  represented  by  appropriate 

2 

choice  of  the  variance  matrix  0  V($)  which  is  non-singular  and 


assumed  known  apart  from  the  multiplier  0  and  the  parameter 
usually  scalar. 

The  presence  of  <j>  complicates  an  otherwise  trivial  estimation 
problem.  If  $  and  hence  V  are  known,  generalized  least  squares 
leads  to  the  estimating  equations 

>v  [y  -  xgi  -  0 

from  which  p  may  be  eliminated  to  yield  the  reduced  equations  for 
T  alone : 

DT(I  -  S) (y  -  DT)  =  0  (2.2) 

where  (de  Hoog,  Speed  and  Williams,  1985) 

-1  -1  T  -1  -  T  -1 

S  =  I  -  V  +  V  R(R  V  R)  R  V 

,  (2.3) 

=  I  -  ((I  -  PR)V(I  -  PR)1 

Here  I  is  the  n  *  n  identity  matrix,  and  for  any  matrix  A,  A~ 

denotes  any  generalized  inverse,  A+  the  Moore-Penrose  inverse,  and 

PA  the  projector  A(ATA)-AT,  which  is  invariant  to  the  choice  of 

generalized  inverse.  The  second  form  for  S  emphasizes  that  not  all 

of  V  need  be  specified,  only  the  result  after  sweeping  out  the  fixed 

effects  in  R  (see  example  2  below). 

Since  V  depends  on  4> ,  so  do  S  and  the  generalized  least- 

squares  estimate  T;  these  will  be  denoted  S(<|>),  T(<|>)  for  emphasis 

where  necessary.  If  the  model  is  correct  and  4>  known  this  is  an 

2 

efficient  analysis,  and  o  may  be  estimated  to  quantify  the  pre¬ 
cision  of  i:  however,  the  least-squares  principle  does  not  of  itself 
lead  to  an  estimate  of  4> . 
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In  the  analysis  of  incomplete  block  designs,  with  recovery  of 

inter-block  information  (see  Cochran  and  Cox,  1957,  chapters  9  -  11), 

2  2 

the  assumed  variance  matrix  is  0V{$)=0(I+  here  Z  is  the 

design  matrix  for  blocks,  so  for  constant  block  size  k,  Pz  = 

«.  1  m 

k  ZZ  .  This  gives  an  error  structure  with  two  uncorrelated 

2 

components:  plot  error  with  variance  0  ,  and  random  block  effects 
2  -1 

with  variance  o  $k  .  If  the  design  is  resolvable,  R  is  taken  to 
be  the  design  matrix  for  fixed  replicate  effects:  otherwise,  R  is  a 
single  column  of  1 's  to  fit  an  overall  mean  only.  Mote  that  some 
authors  (for  example,  Nelder  (1968))  assume  instead  that  replicate 
effects  are  random. 

Example  2 .  Least-squares  smoothing. 

Green,  et  al  (1983,  1985)  describe  a  method  of  analysis  for  field 
experiments  derived  from  smoothing  based  on  a  quadratic  penalty 
function.  The  version  providing  one-dimensional  adjustment  is  equiva¬ 
lent  to  a  generalized  least  squares  regression  of  Ay  on  AD  with 
o  (X  I  +  AA  )  as  assumed  variance  matrix  for  Ay,  where  A  is  a 
rectangular  matrix  taking  second  differences  along  lines  of  adjacent 
plots,  and  X  is  a  tuning  constant.  The  analysis  is  invariant  to 
linear  trends  within  lines  of  adjacent  plots,  thus  we  take  R  to  be  a 

design  matrix  for  separate  linear  regressions  in  each  such  line,  D 

T 

fits  all  treatment  contrasts,  and  V  is  any  matrix  such  that  AVA  = 

—  It  —  i 

X  I  +  AA  .  The  tuning  constant  X,  or  X  ,  plays  the  role  of  <j> . 

This  model  was  also  proposed,  independently,  by  Nelder  in  discussion 


of  Wilkinson,  et  al  (1983). 


Other  forms  of  least-squares  smoothing  are  discussed  by  Green 
et  al.  One  is  equivalent  to  the  linear  variance  analysis  of  Patterson 
(in  discussion  of  Wilkinson,  et  al  (1983))  and  Williams  (1985b),  which 
uses  V  as  above,  but  where  A  takes  first  differences  between 
adjacent  plots  in  the  same  replicate. 
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T 

(y  -  DT)  (I  -  S)(y  -  DT)  and  hence  to  T.  For  if  (3.2)  followed  by 

*  *  m 

(3.3)  updates  t  to  x  ,  then  ft(x  )  =  fl(i)  -zl(I  +  S)z  where  z  = 
PD(I  -  S)(y  -  DT);  convergence  is  obtained  if  (I  -  S)  is  non¬ 
negative  definite  and  (I  +  S)  is  positive  definite,  and  these  are 
true  for  S  of  (2.3).  (Further,  replacing  S  by  aS  +  (1  -  a)I 
does  not  affect  the  solution  to  (2.2),  so  that  adjusting  a  may 
increase  the  speed  of  convergence:  it  is  fastest  when  a  “  2. ) 

Generalized  least-squares  estimates  may  thus  be  obtained  by 
alternately  performing  an  ordinary  least-squares  regression  (3.3)  on 
(y  -  C)  and  smoothing  the  residuals  from  fitted  treatment  effects 
(3.2).  We  term  S  a  'smoother'  because  its  eigenvalues  lie  in 
[0,1],  with  not  all  of  them  equal  to  0  or  1. 

For  the  example  of  incomplete  block  designs,  V  =  I  +  <j>P_,  V-1  = 

I  -  (1  +  4>)”%PZ,  PZR  =  R,  so  S  =  (1  +  <J')~1(PR  +  Th«s  the 

relevant  'smoother'  involves  a  weighted  average  of  the  block  means  and 
the  overall  mean  (or  replicate  means  in  the  resolvable  case).  In 
neighbour  methods,  S  corresponds  more  closely  to  the  intuitive 
notion  of  smoothing. 

We  should  clarify  the  status  of  £.  In  the  model  (3.1)  this 

represents  a  'trend'  term  incorporating  both  fixed  and  random 

effects.  Under  the  additional  assumption  of  joint  Normality  for 

£  and  n,  the  conditional  expectation  of  £  given  y  is 

( I  -  V  1 ) (y  -  DT )  +  V  1Rp,  whose  generalized  least-squares  estimate 

S(y  -  DT )  is  produced  by  the  alternating  iteration  described  above. 

This  generalizes  ridge  regression  (Hoerl  and  Kennard,  1970):  if 

2-1  T 

T  and  p  are  absent,  and  £  has  prior  variance  0  \  WW  , 


then 


£  =  W8  where  8  is  estimated  by  (WTW  +  Al)  %J^y. 

2  2  T 

When  the  variance  matrix  is  o  V  =  o  (I  +  E<t>,W.W.),  where  {6.} 

ill  ri 

and  {w^}  are  known  (one  or  more  variance  components  besides  white 

noise) ,  an  alternative  representation  as  a  smoothing  problem  is 

possible.  Taking  {w^}  to  be  of  full  rank,  we  have  implicitly: 

y  =  DT  +  Rp  +  2)Wi8i  +  h  (3.4) 

where  t  and  p  are  fixed  effects,  and  {8^}  and  h  are  un- 

2 

correlated  zero-expectation  random  effects  with  var(8^)  =  <p^0  I  and 
2 

var(n)  =  0  I .  (The  identity  matrices  may  be  of  different  orders.) 
Minimization  of  the  penalty  function 

c  =  £4>-16t0  .  +  nTn  (3.5) 

ill 

A 

subject  to  the  additive  model  (3.4)  leads  again  to  the  estimate  T. 
Yet  another  equivalent  formulation  is  to  minimize  the  ordinary  error 
sum-of-squares  h  n  subject  to  (3.4)  and  upper  bounds  on  * 

A  very  similar  approach  to  smoothing  is  often  followed  in  non- 
parametric  regression  problems  (Wahba  and  Wold,  1975;  Wahba,  1977). 
Here  the  model  would  be  y^  =  ^(t^)  +  and  one  possible  penalty 

function  is  f(5"(t))^dt  + 

1  l 

Natural  points  of  departure  for  generalizing  the  least-squares 
smoother  are  the  simultaneous  equations  (3.2,  3.3)  or  the  penalty 
function  (3.5).  There  is  no  need  for  S  to  be  symmetric  for  (3.2) 
and  (3.3)  to  solve  (2.2),  so  that  asymmetric  linear  estimating  equa¬ 
tions  such  as  tho^e  of  Wilkinson  et  al  (1983)  may  be  included. 
Papadakis's  method,  whether  iterated  or  not,  also  fits  this  formu¬ 
lation  naturally.  As  suggested  by  Green,  et  al  (1985),  alternative 
robust/ resi stant  analyses  may  be  obtained  by  use  of  a  non-linear 
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smoother  or  treatment  estimator  in  (3.2,  3.3),  or  by  amending  the 
quadratic  loss  function  (3.5). 

4.  Choosing  4> 

Since  $  cannot  be  estimated  by  least-squares  principles,  a  wide 
variety  of  methods  for  choosing  its  value  have  been  proposed.  Yates's 
original  proposal  for  incomplete  block  designs  (1939,  1940)  entailed 
equating  two  suitably  chosen  sums  of  squares  to  expectation,  an 
approach  also  adopted  by  Williams  (1985b).  Alternative  estimators  for 
block  designs,  based  on  Normal-theory  likelihood  methods  were  given  by 
Nelder  (1968)  and  Patterson  and  Thompson  (1971).  We  return  to  these 
criteria  in  Section  5.  For  Bayesian  viewpoints,  see  Lindley  and  Smith 
(1972)  and  Box  and  Tiao  (1973,  chapter  7). 

An  attractive,  less  model-dependent,  alternative  is  to  use  the 
criterion  of  cross-validation,  as  described  by  Stone  (1974).  The  idea 
is  to  treat  each  observation  in  turn  as  'missing'  and  to  'predict'  it 
from  the  model  as  fitted  to  the  remaining  observations  for  each  given 
value  of  <j>.  The  parameter  <j>  is  then  chosen  to  minimize  the  mean 
squared  error  of  prediction.  Considerable  use  has  been  made  of  cross- 
validation  and  related  techniques  in  recent  work  on  smoothing  (e.g. 
Wahba  and  Wold  (1975),  Craven  and  Wahba  (1979)  and  Silverman  (1985)). 
In  Stone's  terminology,  the  term  'model'  is  abandoned  as  conveying  a 
richer  meaning  than  intended,  and  replaced  by  that  of  a  'prescription' 
or  class  of  predictors. 

Our  prescription  involves  minimizing  the  weighted  sum-of-squares 
(y  -  DT  -  Rp)TV(4>)_1(y  -  DT  -  Fp) 
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or,  equivalently, 

(y  -  Dt)T(I  -  S  ( 4> )  )<y  -  DT  ) 

over  choice  of  t,  p  (thereby  estimating  these)  and  choice  of  the 
'missing'  component(s)  of  y.  This  is  equivalent  to  fitting  a  dummy 
covariate  for  each  missing  observation,  gives  generalized  least- 
squares  estimates  based  on  the  available  data,  and  provides  the  con¬ 
ventional  'missing  value  formula'  in  the  case  of  uncorrelated  data. 

Introducing  the  dummy  design  matrix  E  for  missing  observations 
gives  the  augmented  model 

E(y )  =  Dt  +  Rp  +  ETy ,  var(y)  =  0^(4 >) 

The  reduced  estimating  equations  for  the  cross-validation  errors  Y 
are 

ETM(y  -  EY)  =  0 

where 


-1  -1  T  -1  -  T  -1 

M  =  M(  <j> )  =  V  -  V  X(X  V  X)  X  V 


-  (1  -  S)  -  (I  -  s)d(dt(i  -  S)D)"dT(X  -  S)  (4.1) 


■  1(1  -  Vr,V(I  -  PD,R>) 


Here,  PD  R  is  the  projector  for  the  partitioned  matrix  [D  i  R]  .  In 
particular,  if  a  single  observation,  y^  say,  is  deemed  to  be 
missing,  E  consists  of  a  single  column  of  0's  with  1  in  row  i, 
and  the  prediction  error  Y  =  Y^j,  say,  is 

n 

Y  ( i ,  =  (ETME)"1ETMy  =  [  ^  M^Y^jAii  * 

The  cross-validation  mean  squared  error  is  therefore 


_ *.  -  a  J  «.  - 
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■*.  *  f&t* •'  .* 
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?able  3 


Simulation  for  least  squares  smoothing: 


10th  and  50th  percentiles  of  percentage  efficiency  for 
5  criteria  for  choice  of  $ . 


tesigns  (each  48  plots  in  one  line) 
t=4 ) 

2321434121342123432414132312424313142 
2  1  4  3  4  1  2  1  3 

( t*  1 6 ) 
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examine  the  criteria  marginally:  an  appropriate  measure  of  the 

* 

efficiency  of  a  particular  choice  $  is 

v  A  2 

min  l  (t  <$)-T  ) 

..i-JrJ-i _ 3 . 

t  a  a  n 

l  <*.,<♦  ) “T  ) 2 
j-1  3  3 

where  {t^}  are  the  true  values,  here  zero.  Note  that  100% 
efficiency  cannot  be  attained.  For  each  criterion,  the  empirical 
distribution  of  efficiency,  from  the  100  replicates,  was  con¬ 
structed.  Some  of  the  results  are  presented  in  Table  3.  They 
demonstrate  rather  close  agreement  between  the  criteria,  and  suggest 
no  clear  preferences.  The  superiority  of  generalized  cross-validation 
over  the  ordinary  version  found  by  Craven  an*.  Wahba  ( 1979)  is  not 
apparent  here,  presumably  because  of  the  well-conditioned  nature  of 
these  designed  experiments. 

Since  the  other  criteria  are  less  readily  adapted  to  a  variety  of 
smoothing  methods,  especially  those  not  derived  from  least-squares, 
these  results  support  the  use  of  cross-validation  for  choice  of 
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algebraic  results  in  Sections  4,  5  and  6.  The  form  of  V($)  assumed 
was  that  corresponding  to  ordinary  least-squares  smoothing  (example  2 
in  Section  2).  Three  factors  were  varied  in  the  experiment. 

(I)  The  true  variance  structure  for  y:  three  alternatives, 
in  each  case  white  noise  n  plus  correlated  £ ,  all 
jointly  Normally  distributed  with  (a)  first  differences  or 
(b)  second  differences  of  £  independent  and  identically 
distributed,  or  (c)  £  as  a  7-point  equally  weighted 
moving  average. 

(11)  The  true  roughness,  measured  by  $true  = 

(n£TATA£)/{(n-2)nTn):  two  alternatives,  .02  and  .2. 

(Ill)  The  design  (and  number  of  treatments):  two  alternatives, 
both  with  n  3  48  plots:  one  adapted  from  a  serially 
balanced  design  on  4  treatments,  the  other  a  triple 
lattice  with  3  replicates  of  16  treatments,  block  size 
4. 

For  each  of  the  resulting  12  ■  3  *  2  x  2  cases,  100  replicates 
were  performed:  £  and  n  were  drawn  independently  from  (I)  and 
scaled  according  to  (II),  then  the  sum  y  *  £  +  h  analysed  by  least- 
squares  smoothing  assuming  design  (III).  Thus  true  treatment  effects 
were  set  to  zero. 

Five  criteria  were  compared:  ordinary  and  generalized  cross- 
validation,  and  three  methods  suggested  in  Green,  Jennison  and  Seheult 
(1985):  ’Tukey's  rule’,  in  which  yTMy/(tr(M) )2  is  minimized,  and 

full  and  restricted  maximum  likelihood.  The  resulting  choices  of  4* 
were  extremely  highly  correlated.  It  was  therefore  sufficient  to 


TP 


r 
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Table  2 

Choice  of  $  in  second-difference  least-squares  smoothing:  data-sets  as 
described  and  analysed  in  Green,  et  al  (1985). 


Data  Set 


Mildew 

SB  77  ES  5 

SB  77  ES  6 

n,t 

38,4 

51,17 

48,16 

Estimate 

Yates 

0.11 

0.0162 

0.0603 

Tukey 

1.86 

0 

0.0613 

Restricted  maximum  likelihood 

2.79 

0.0041 

0.0591 

Generalized  cross-validation 

3.06 

0.0051 

0.0809 

Cross-validation 

3.40 

0.0022 

0.1113 

Maximum  likelihood 

5.27 

0.0238 

0. 1915 
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.  Application  to  least-squares  smoothing 

The  spectral  decomposition  of  M  given  in  (5.1)  reduces  the 
computation  in  least-squares  smoothing  methods,  whether  in  the  form 
based  on  second  differences  as  discussed  in  detail  by  Green,  et  al 
(1985)  or  in  the  generalizations  described  here. 

All  of  the  methods  for  choosing  the  tuning  constant  $  1 
proposed  by  Green,  et  al  involve  computing  the  decomposition 

AAA  ^  J 

y  =  Dt  +  5  +  n  for  several  values  of  $  •  The  spectral 

decomposition  (5.1),  while  incurring  a  set-up  cost,  permits  the 
decomposition  and  various  derived  statistics  to  be  computed  very 
cheaply  for  subsequent  values  of  $  1 • 

When  V((j>)  =  I  +  <|>WWT,  note  that  V-1  -  I-W(<J>_1  1+WTW)_1WT  + 

I-Pw  as  <J>  +  ".  The  fixed  effects  T,  p  in  model  (2.1)  are 
estimated  by  Ty  where  T  =  T($)  *  (XTV"'^X )“^XTV~ * .  Note  that  M  * 

V~ 1 ( 1-XT) .  Now 

T(«)  ( I -M ( <f> )  )  =  (XT(I-Pw)X)~1XT(I-Pw)(I-V(«|>)~1(I-XT(lj»)))  -  T ( 4> ) 

* 

since  (I-Pw)  (I-V~ ')  =  0.  Thus  if  we  write  h  =  n($)  =  M((|))y,  the 

A  A 

decomposition  of  y-h(<|>)  as  DT($)  +  £($)  is  obtained  by  linear 
transformations  not  depending  on  only  M($)  need  be  re-computed 

for  each  and  that  from  the  spectral  decomposition  (5.1). 

Numerical  examples  of  choices  of  <j>  are  given  in  Table  2:  the 
data-sets  are  those  used  by  Green,  et  al  (1985). 

7 .  A  simulation  study 

To  compare  the  use  of  cross-validation  in  choosing  $  with  other 


criteria,  we  performed  a  simulation  experiment,  making  use  of  the 


To  derive  an  analogue  of  Yates'  estimate  for  $  valid  for 

T 

general  known  W  in  V  =  I  +  $WW  ,  note  that  in  consequence  of 

(5.1)  Sx  and  Sx  w  are  the  limits,  as  4>  0  and  “  respectively, 

T 

of  the  weighted  error  sum-of-squares  I*($)  =  y  My  = 
t  T  ~  1  i 

min  l(y-DT-Rp)  V  (y-DT-Rp)}  that  would  be  the  focus  of  attention 
i  ,  P 

if  $  were  known.  Equating  I*(0)  and  lim^^rf^)  to  expectation 
under  model  (2.1)  therefore  seems  the  natural  analogue,  although  this 
procedure  is  not  as  cheap  and  convenient  as  in  the  incomplete  blocks 
case,  because  the  expectations  in  general  involve  non-trivial  trace 
terms.  Further,  if  D,  R  and  W  together  span  8°,  e.g.  least- 
squares  smoothing  with  first  or  second  differences,  F($)  ♦  0  as 
<J>  -►  <*>.  In  this  situation,  Williams  (1985b)  tacitly  uses 
lim^^  <|>r(<fr)  as  the  second  sum  of  squares:  from  (5.1)  we  see  that 
this  equals  yTNy,  where  N  *  ((I  -  PX)WWT(I  -  Px)]+.  Provided 
that  (I  -  PV)(I  -  PR)  =  0,  the  result  of  de  Hoog,  et  al  (1985) 
stated  in  (2.4)  remains  true  when  V  is  singular,  if  V  1  is 
replaced  by  V  .  Thus,  replacing  V  by  WW  and  R  by  X,  we  see 
that  if  (I  -  PW)(I  -  Px)  =  0  we  have 

w  rn  rn  fr  _  rri 

N  =  ATA  -  A*AX(XT  A  AX)  XXA  A 


=  A*(I  -  P,X)A 


if  A  is  chosen  so  that  (WWT)+  =  ATA;  for  example  A  «*  (WTW)-1WT 
if  W  has  full  rank.  Thus  yTNy  is  the  residual  sum-of-squares  from 
ordinary  regression  of  Ay  on  AX. 
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Table  1 


Balanced  Incomplete  block  designs: 


estimate  of  $  for  four  data  sets,  according 
to  five  different  criteria. 


Data  set 


Davies 

Davies 

Quenouille 

John 

Ditto , 

(1954, 

(1954, 

(1953, 

(1971, 

fitting 

p207 ) 

p2 16 ) 

p177) 

p226) 

replicates 

12,4,3 

20,5,4 

30,6,3 

36,9,3 

36,9,3 

Estimate 


Jensen  and  Stone 

(1976)* 

13.08 

-0.6357 

2.938 

-0.4444 

- 

Yates  (1940) 

21.40 

-0.5033 

3.320 

0.1348 

-0.1685 

Cross-validation 

as 

21.40 

-0.5033 

3.446 

0.1975 

-0. 1685 

described  here 

Nelder  (1968) 

21.40 

-0.5033 

3.355 

0.2102 

-0. 1685 

Stein  (1966)* 

68.32 

0.2640 

9. 123 

0.4303 

_ 

Notes:  Where  $  is  estimated  as  negative,  it  would  be  customary  to  use 

<J>  =  0,  i.e.  to  use  no  block  adjustment.  The  rows  marked  *  were 
given  for  these  data  sets  by  Jensen  and  Stone.  Nelder's  estimates  are 
identical  with  those  of  Patterson  and  Thompson  (1971)  since  the  blocks 
are  of  equal  size. 


T 

squares^  y  (I  -  PXfW)y  ”  sx,W'  and  b^oc,cs  sum-of-squares,  within 

T 

replicates  if  relevant,  adjusted  for  treatments,  y  (px,w  "  px>y  * 

SX  ~  SX,W 

The  requirement  of  equality  of  the  non-zero  eigenvalues  {X^}  is 
strong.  It  demands  considerable  implicit  symmetry  in  the  design;  for 
example,  it  does  not  apply  to  all  balanced  incomplete  blocks  designs. 
If  WW  is  proportional  to  a  projector  for  some  factor,  the 
requirement  is  of  first-order-balance  for  this  factor  with  respect 
to  X,  in  the  sense  of  James  and  Wilkinson  (1971). 

Some  numerical  examples  are  given  in  Table  1  for  the  same 
collection  of  BlBD's  considered  by  Jensen  and  Stone  (1976)  in  their 
application  of  cross-validation  to  these  designs.  Note  the  exact  or 
close  agreement  between  cross-validation  and  the  criteria  of  Yates 
(1940),  Nelder  (1968)  and  Patterson  and  Thompson  (1971).  when  the 
eigenvalues  {X^}  differ,  the  exact  connection  is  broken;  however  if 
they  do  not  differ  too  markedly,  the  argument  above  suggests  approxi¬ 
mate  equivalence  of  the  two  criteria. 

It  will  be  noted  from  Table  1  that  Jensen  and  Stone  (1976) 
obtained  different  weights  from  a  cross-validation  argument  based  on  a 
different  prescription.  They  obtain  separate  predictors  for  the 
intra-  and  inter-block  extremes  of  $("  and  -1)  regarding  an  entire 
block  as  missing  in  deriving  the  inter-block  predictor.  The  Stein 
estimates  differ  considerably  from  the  rest,  apparently  because  they 
utilize  inter-block  information  only  through  the  treatment  com- 

a  ^ 

ponents  Stein’s  estimator  (1966)  uses  only  t(-1),  T(»)  and  Sxw. 
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Since  the  two  terms  are  symmetric  and  commute,  they  may  be  simul¬ 
taneously  diagonalized  (Wilkinson,  1965,  p.  52)  and  then  using  the 
idempotence  of  I  -  Px  it  is  straightforward  to  prove  the  existence 
of  an  orthogonal  matrix  U  partitioned  as  [U^  U2  Uj] ,  where 
has  r^  columns,  and  a  diagonal  matrix  A  of  positive  eigenvalues 
U^}  such  that  Px  =  U^,  (I  -  PX)WWT(1  -  Px)  = 

U2  A  U2  and  I  -  Px  w  =  U3U3*  Then 

M(  4> )  =  U2(I  +  4>A)-1U2  +  U3U2  (5.1) 

so  that  M  has  eigenvalues  0  and  1  with  multiplicities  r1  and 
r3,  and  also  (d  +  >  1 ,  j  =  1,2,...,r2).  Note  that  this 

explicitly  demonstrates  how  M  varies  from  (I  -  Px)  to 
(I  -  Px>w)  as  ^  increases  from  0  to  °»j  for  example,  in  the 
incomplete  blocks  model  these  limits  corresponding  to  ignoring  block 
effects,  and  to  fitting  them  as  fixed,  respectively.  The  decompo¬ 
sition  used  here  is  essentially  that  used  by  Patterson  and  Thompson 
(1971)  for  block  models. 

It  follows  from  (5.1)  that  (d/d$)M  =  <|>  1  (M2  -  M)  ,  so  that  by 
the  remarks  at  the  end  of  the  previous  section,  generalized  cross- 
validation  is  equivalent  to  equating  to  expectation  yTMry  for  r  = 

2  and  3,  and  restricted  maximum  likelihood  similarly  for  r  =  1 
and  2. 

When  the  {A^}  are  equal,  all  positive  powers  of  M  are  convex 
combinations  of  I  -  Px  and  1  -  px,w  so  that  both  criteria  are 
algebraically  equivalent  to  Yates's  estimate  of  4>  for  incomplete 
block  designs  (1939,  1940)?  in  our  notation,  with  W  =  Pz,  this 
entails  equating  to  expectation  the  intra-block  residual  sum-of- 


structure  V ( 4> ) ,  presumably  in  practice  from  a  small  number  of 


alternatives.  Coupled  with  cross-validatory  assessment  of  that  choice 
(Stone,  1974),  this  may  be  the  only  reasonable  way  to  choose  between 
methods  on  the  basis  of  an  individual  ^ata  set,  rather  than,  for 
example,  from  uniformity  data  believed  to  have  similar  covariance 
structure . 

Mr.  Robin  Thompson  has  pointed  out  to  me  an  interesting  parallel 
between  the  present  approach  and  that  of  restricted  maximum  likeli¬ 
hood,  which  may  be  stated  in  some  generality  as  follows.  Differenti¬ 
ating  (4.3)  with  respect  to  <)>  and  noting  that  MX  =  0  and  MVM  = 

M  reveals  that  generalized  cross-validation  is  equivalent  to  equating 
to  their  expectation  under  (2.1)  certain  sums-of-squares ,  namely 
yTM2y  and  its  derivative,  or  for  vector  <j>  all  partial  deriva¬ 
tives.  The  restricted  maximum  likelihood  approach  of  Patterson  and 
Thompson  (1971),  generalized  to  arbitrary  V,  does  the  same  but 
with  M  replaced  by  M.  Some  numerical  comparisons  will  be  made  in 
Section  5  and  6. 

5 .  A  spectral  decomposition 

We  now  restrict  attention  to  variance  structures  of  the  form  V  = 
T 

I  +  4>WW  for  some  known  matrix  W,  representing  plot  error  with  one 
other  variance  component:  $  is  the  ratio  of  variances.  This 
includes  the  incomplete  blocks  model,  and  least-squares  smoothing 
based  on  first  or  second  differences. 

From  (4.1)  we  have 

M+  =  (I  -  PX)V(1  -  Px)  =  (I  -  Px)  +  <(>  ( I  -  PX)WWT(1  -  Px)  . 

1 1 


P 
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I 


I 


c<*>  = n_1  i  \L)2  =i  nn 


i=  1 


i=  1  j=1 


i]  ]' 


(4.2) 


Thus,  by  analogy  with  Wahba  (1977)  and  Craven  and  Wahba  (1979), 
there  is  an  algebraic  form  for  C($)  that  can  be  computed  without 
performing  n  separate  regression  calculations.  However,  there  is 
still  usually  a  much  greater  burden  in  evaluating  M(<|>)  than  in 
finding  T  (<{>).  Some  short-cuts  are  possible  (see  Sections  5  and  6; 
also  Craven  and  Wahba  (1979)),  but  to  alleviate  the  problem,  and  to 
acquire  a  form  of  rotation-invariance,  Wahba  (1977)  proposed  an 
alternative  criterion  of  generalized  cross-validation,  derived  by 
replacing  Mi;i  in  (4.2)  by  its  average,  n-1tr(M),  to  give 

G ( 0 )  =  n(tr(M))-2  yTM2y  .  (4.3) 

Under  certain  conditions  of  balance  or  symmetry  in  designed  experi¬ 
ments,  the  Mii  are  equal.  Choice  of  ij>,  whether  scalar  or  vector¬ 
valued,  can  be  made  in  practice  by  numerical  minimization  of  (4.2)  or 
(4.3):  for  scalar  <j>,  we  have  found  that  both  golden-section  and 
quadratic  interpolation  on  log  $  work  well. 

In  the  context  of  non-parametric  regression,  Silverman  (1984) 
uses  results  of  Utreras  (1980,  1981)  to  amend  the  criterion  further, 
by  calculating  the  trace  from  eigenvalue  approximations. 

When  using  a  smoother  S  not  of  the  form  (2.3),  these  algebraic 
simplifications  are  not  available,  but  the  principle  of  cross- 
validation  may  still  be  used.  The  prescription  must  define  how  to 
smooth  across  the  gap  caused  by  a  missing  observation. 

A  logical  extension  to  cross-validation  for  selecting  the 
parameter  4>  is  to  allow  it  to  choose  the  form  of  the  variance 
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